For a long time, R has had a relatively simple mechanism, via the maps package, for making simple outlines of maps and plotting lat-long points and paths on them.
More recently, with the advent of packages like sp, rgdal, and rgeos, R has been acquiring much of the functionality of traditional GIS packages (like ArcGIS, etc). This is an exciting development, but not always easily accessible for the beginner, as it requires installation of specialized external libraries (that may, on some platforms, not be straightforward) and considerable familiarity with GIS concepts.
More recently, a third approach to convenient mapping, using ggmap has been developed that allows the tiling of detailed base maps from Google Earth or Open Street Maps, upon which spatial data may be plotted. Today, we are going to focus on mapping using base maps from R’s tried and true maps package and also using the ggmap package. We won’t cover the more advanced GIS-related topics nor using rgdal, or sp to plot maps with different projections, etc. Nor will cover the somewhat more simplified approach to projections using the mapproj package.
As in our previous explorations in this course, when it comes to plotting, we are going to completely skip over R’s base graphics system and head directly to Hadley Wickham’s ggplot2 package. Hadley has included a few functions that make it relatively easy to interact with the data in R’s maps package, and of course, once a map layer is laid down, you have all the power of ggplot at your fingertips to overlay whatever you may want to over the map. ggmap is a package that goes out to different map servers and grabs base maps to plot things on, then it sets up the coordinate system and writes it out as the base layer for further ggplotting. It is pretty sweet, but does not support different projections.
maps package
ggplot2 can deal withggplot2 related issues about plotting things.ggmap to make some pretty decent looking mapsI feel that the above twp topics should cover a large part of what people will need for making useful maps of field sites, or sampling locations, or fishing track lines, etc.
For today we will be skipping how to read in traditional GIS “shapefiles” so as to minimize the number of packages that need installation, but keep in mind that it isn’t too hard to do that in R, too.
maps package contains a lot of outlines of continents, countries, states, and counties that have been with R for a long time.mapdata package contains a few more, higher-resolution outlines.maps package comes with a plotting function, but, we will opt to use ggplot2 to plot the maps in the maps package.ggplot2 operates on data frames. Therefore we need some way to translate the maps data into a data frame format that ggplot can use.maps provides lots of different map outlines and points for cities, etc.usa, nz, state, world, etc.ggplot2 provides the map_data() function.
map_data("name") where “name” is a quoted string of the name of a map in the maps or mapdata packagemaps:usa <- map_data("usa")
dim(usa)
## [1] 7243 6
head(usa)
## long lat group order region subregion
## 1 -101.4078 29.74224 1 1 main <NA>
## 2 -101.3906 29.74224 1 2 main <NA>
## 3 -101.3620 29.65056 1 3 main <NA>
## 4 -101.3505 29.63911 1 4 main <NA>
## 5 -101.3219 29.63338 1 5 main <NA>
## 6 -101.3047 29.64484 1 6 main <NA>
tail(usa)
## long lat group order region subregion
## 7247 -122.6187 48.37482 10 7247 whidbey island <NA>
## 7248 -122.6359 48.35764 10 7248 whidbey island <NA>
## 7249 -122.6703 48.31180 10 7249 whidbey island <NA>
## 7250 -122.7218 48.23732 10 7250 whidbey island <NA>
## 7251 -122.7104 48.21440 10 7251 whidbey island <NA>
## 7252 -122.6703 48.17429 10 7252 whidbey island <NA>
mapdataw2hr <- map_data("world2Hires")
dim(w2hr)
## [1] 2274539 6
head(w2hr)
## long lat group order region subregion
## 1 226.6336 58.42416 1 1 Canada <NA>
## 2 226.6314 58.42336 1 2 Canada <NA>
## 3 226.6122 58.41196 1 3 Canada <NA>
## 4 226.5911 58.40027 1 4 Canada <NA>
## 5 226.5719 58.38864 1 5 Canada <NA>
## 6 226.5528 58.37724 1 6 Canada <NA>
tail(w2hr)
## long lat group order region subregion
## 2276817 125.0258 11.18471 2284 2276817 Philippines Leyte
## 2276818 125.0172 11.17142 2284 2276818 Philippines Leyte
## 2276819 125.0114 11.16110 2284 2276819 Philippines Leyte
## 2276820 125.0100 11.15555 2284 2276820 Philippines Leyte
## 2276821 125.0111 11.14861 2284 2276821 Philippines Leyte
## 2276822 125.0155 11.13887 2284 2276822 Philippines Leyte
These are pretty straightforward:
long is longitude. Things to the west of the prime meridian are negative.lat is latitude.order. This just shows in which order ggplot should “connect the dots”region and subregion tell what region or subregion a set of points surrounds.group. This is very important! ggplot2’s functions can take a group argument which controls (amongst other things) whether adjacent points should be connected by lines. If they are in the same group, then they get connected, but if they are in different groups then they don’t.
ggplot “lifts the pen” when going between them.geom_polygon().geom_polygon() drawn lines between points and “closes them up” (i.e. draws a line from the last point back to the first point)group aesthetic to the group columnx = long and y = lat are the other aesthetics.By default, geom_polygon() draws with no line color, but with a black fill:
usa <- map_data("usa") # we already did this, but we can do it again
ggplot() +
geom_polygon(data = usa, aes(x=long, y = lat, group = group)) +
coord_fixed(1.3)
ggsave for example)), the aspect ratio remains unchanged.aes function.ggplot(usa) +
geom_polygon(aes(x = long, y = lat, group = group),
fill = NA, color = "red") +
coord_fixed(1.3)
gg1 <- ggplot(usa) +
geom_polygon(aes(x = long, y = lat, group = group),
fill = "violet", color = "blue") +
coord_fixed(1.3)
gg1
labs <- data.frame(
long = c(-122.064873, -122.306417),
lat = c(36.951968, 47.644855),
names = c("SWFSC-FED", "NWFSC"),
stringsAsFactors = FALSE
)
gg1 +
geom_point(data = labs, aes(x = long, y = lat), color = "black", size = 5) +
geom_point(data = labs, aes(x = long, y = lat), color = "yellow", size = 4)
To show how important the group aesthetic is, here we plot the map without using the group aesthetic:
ggplot() +
geom_polygon(data = usa, aes(x = long, y = lat), fill = "violet", color = "blue") +
geom_point(data = labs, aes(x = long, y = lat), color = "black", size = 5) +
geom_point(data = labs, aes(x = long, y = lat), color = "yellow", size = 4) +
coord_fixed(1.3)
We can also get a data frame of polygons that tell us above state boundaries:
states <- map_data("state")
dim(states)
## [1] 15537 6
head(states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
tail(states)
## long lat group order region subregion
## 15594 -106.3295 41.00659 63 15594 wyoming <NA>
## 15595 -106.8566 41.01232 63 15595 wyoming <NA>
## 15596 -107.3093 41.01805 63 15596 wyoming <NA>
## 15597 -107.9223 41.01805 63 15597 wyoming <NA>
## 15598 -109.0568 40.98940 63 15598 wyoming <NA>
## 15599 -109.0511 40.99513 63 15599 wyoming <NA>
This is just like it is above, but we can map fill to region and make sure the the lines of state borders are white.
ggplot(states) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group), color = "white") +
coord_fixed(1.3) +
guides(fill = FALSE) # do this to leave off the color legend
Boom! That is easy.
west_coast <- filter(states, region %in% c("california", "oregon", "washington"))
ggplot(west_coast) +
geom_polygon(aes(x = long, y = lat), fill = "palegreen", color = "black")
groupcoord_fixed()ggplot(data = west_coast) +
geom_polygon(aes(x = long, y = lat, group = group), fill = "palegreen", color = "black") +
coord_fixed(1.3)
Phew! That is a little better!
ca_df <- filter(states, region == "california")
head(ca_df)
## long lat group order region subregion
## 1 -120.0060 42.00927 4 667 california <NA>
## 2 -120.0060 41.20139 4 668 california <NA>
## 3 -120.0060 39.70024 4 669 california <NA>
## 4 -119.9946 39.44241 4 670 california <NA>
## 5 -120.0060 39.31636 4 671 california <NA>
## 6 -120.0060 39.16166 4 672 california <NA>
counties <- map_data("county")
ca_county <- filter(counties, region == "california")
head(ca_county)
## long lat group order region subregion
## 1 -121.4785 37.48290 157 6965 california alameda
## 2 -121.5129 37.48290 157 6966 california alameda
## 3 -121.8853 37.48290 157 6967 california alameda
## 4 -121.8968 37.46571 157 6968 california alameda
## 5 -121.9254 37.45998 157 6969 california alameda
## 6 -121.9483 37.47717 157 6970 california alameda
theme_nothing().ca_base <- ggplot(ca_df, aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")
ca_base + theme_nothing()
ca_base + theme_nothing() +
geom_polygon(data = ca_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) # get the state border back on top
# scrape population data
county_html <- read_html("https://en.wikipedia.org/wiki/List_of_counties_in_California")
pop_and_area <- county_html %>%
html_node(".wikitable.sortable") %>%
html_table() %>%
select(county = County, population = `Population[6]`, area = `Area[4]`) %>%
mutate(county = county %>%
str_replace_all(c("City and County of San Francisco" = "San Francisco",
" County" = "")) %>%
str_to_lower,
population = population %>%
str_replace_all(".*\u2660", "") %>% parse_number,
area = area %>%
str_replace_all(".*\u2660|\n.*", "") %>% parse_number)
inner_join.cacopa <- inner_join(ca_county, pop_and_area, by = c("subregion" = "county"))
people_per_mile:cacopa <- mutate(cacopa, people_per_mile = population / area)
head(cacopa)
## long lat group order region subregion population area
## 1 -121.4785 37.48290 157 6965 california alameda 1638215 738
## 2 -121.5129 37.48290 157 6966 california alameda 1638215 738
## 3 -121.8853 37.48290 157 6967 california alameda 1638215 738
## 4 -121.8968 37.46571 157 6968 california alameda 1638215 738
## 5 -121.9254 37.45998 157 6969 california alameda 1638215 738
## 6 -121.9483 37.47717 157 6970 california alameda 1638215 738
## people_per_mile
## 1 2219.804
## 2 2219.804
## 3 2219.804
## 4 2219.804
## 5 2219.804
## 6 2219.804
If you were needing a little more elbow room in the great Golden State, this shows you where you can find it:
# prepare to drop the axes and ticks but leave the guides and legends
# We can't just throw down a theme_nothing()!
ditch_the_axes <- theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.title = element_blank()
)
elbow_room1 <- ca_base +
geom_polygon(data = cacopa, aes(fill = people_per_mile), color = "white") +
geom_polygon(color = "black", fill = NA) +
theme_bw() +
ditch_the_axes
elbow_room1
people_per_mile we can just apply the transformation in the gradient using the trans argumentelbow_room1 + scale_fill_gradient(trans = "log10")
It’s very common for color to represent a dimension of your data in spatial visualizations. As such, you should think about the color palette carefully. Here, contrasting colors are used to show the gamut of population density in California.
eb2 <- elbow_room1 +
scale_fill_gradient(low = "#0C374D", high = "#EFD469", trans = "log10")
eb2
Note that the scale of these maps from package maps are not great. We can zoom in to the Bay region, and it sort of works scale-wise, but if we wanted to zoom in more, it would be tough.
Let’s try!
eb2 + xlim(-123, -121.0) + ylim(36, 38)
geom_polygon() connects the end point of a group to its starting point.xlim and ylim functions in ggplot2 discard all the data that is not within the plot area.
xlim and ylim arguments to coord_cartesian(). Though, to keep the aspect ratio correct we must use coord_fixed() instead of coord_cartesian().eb2 + coord_fixed(xlim = c(-123, -121.0),
ylim = c(36, 38),
ratio = 1.3)
The ggmap package is the most exciting R mapping tool in a long time. You might be able to get better looking maps at some resolutions by using shapefiles and rasters from naturalearthdata.com but ggmap will get you 95% of the way there with only 5% of the work!
ggmap() much as you would with ggplot()sisquoc <- read_tsv("data/sisquoc-points.txt")
sisquoc
## # A tibble: 8 x 3
## name lon lat
## <chr> <dbl> <dbl>
## 1 a17 -119.7603 34.75474
## 2 a20-24 -119.7563 34.75380
## 3 a25-28 -119.7537 34.75371
## 4 a18,19 -119.7573 34.75409
## 5 a35,36 -119.7467 34.75144
## 6 a31 -119.7478 34.75234
## 7 a38 -119.7447 34.75230
## 8 a43 -119.7437 34.75251
Let’s try using the zoom level. Zoom levels go from 3 (world scale to 20 (house scale)).
# compute the mean lat and lon
ll_means <- summarise_each(sisquoc, "mean", lon, lat) %>% as.numeric(.[1, ])
sq_map2 <- get_map(location = ll_means, maptype = "satellite",
source = "google", zoom = 15)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.753117,-119.751324&zoom=15&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
ggmap(sq_map2) +
geom_point(data = sisquoc, color = "red", size = 4) +
geom_text(data = sisquoc, aes(label = name),
angle = 60, hjust = -.25, color = "yellow")
* That is decent. How about if we use the “terrain” type of map:
sq_map3 <- get_map(location = ll_means, maptype = "terrain",
source = "google", zoom = 15)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.753117,-119.751324&zoom=15&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(sq_map3) +
geom_point(data = sisquoc, color = "red", size = 4) +
geom_text(data = sisquoc, aes(label = name),
angle = 60, hjust = -.25, color = "yellow")
bike <- read_csv("data/bike-ride.csv")
head(bike)
## # A tibble: 6 x 4
## lon lat elevation time
## <dbl> <dbl> <dbl> <time>
## 1 -122.0646 36.95144 15.8 2011-12-08 19:37:56
## 2 -122.0646 36.95191 15.5 2011-12-08 19:37:59
## 3 -122.0645 36.95201 15.4 2011-12-08 19:38:04
## 4 -122.0645 36.95218 15.5 2011-12-08 19:38:07
## 5 -122.0643 36.95224 15.7 2011-12-08 19:38:10
## 6 -122.0642 36.95233 15.8 2011-12-08 19:38:13
bikemap1 <- get_map(location = c(-122.080954, 36.971709),
maptype = "terrain", source = "google", zoom = 14)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=36.971709,-122.080954&zoom=14&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
ggmap(bikemap1) +
geom_path(data = bike, aes(color = elevation), size = 3, lineend = "round")
In this example, we use fishery data from Eric Anderson of NOAA. He whittled down some stuff in the coded wire tag data base to georeferenced marine locations in British Columbia where at least one Chinook salmon was recovered in between 2000 and 2012 inclusive. To see how he did all that you can check out this
Let’s have a look at the data:
bc <- readRDS("data/bc_sites.rds")
# look at some of it:
bc %>% select(state_or_province:sub_location, longitude, latitude)
## # A tibble: 1,113 x 9
## state_or_province water_type sector region area location sub_location
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2 M S 22 016 THOR IS 01
## 2 2 M N 26 012 MITC BY 18
## 3 2 M S 22 015 HARW IS 02
## 4 2 M N 26 006 HOPK PT 01
## 5 2 M S 23 017 TENT IS 06
## 6 2 M S 28 23A NAHM BY 02
## 7 2 M N 26 006 GIL IS 06
## 8 2 M S 27 024 CLEL IS 06
## 9 2 M S 27 23B SAND IS 04
## 10 2 M N 26 012 DUVA IS 16
## # ... with 1,103 more rows, and 2 more variables: longitude <dbl>,
## # latitude <dbl>
So, we have 1,113 points to play with.
bc %>% group_by(sector, region, area) %>% tally()
## Source: local data frame [42 x 4]
## Groups: sector, region [?]
##
## sector region area n
## <chr> <chr> <chr> <int>
## 1 48 008 1
## 2 48 028 1
## 3 48 311 1
## 4 N 25 001 33
## 5 N 25 003 15
## 6 N 25 004 44
## 7 N 25 02E 2
## 8 N 25 02W 34
## 9 N 26 006 28
## 10 N 26 007 23
## # ... with 32 more rows
make_bbox(). This makes a bounding box rather than relying on the zoom feature. It’s an experimental feature that would be cool if it worked reliably, but unfortunately, it’s known to fail.# compute the bounding box
bc_bbox <- make_bbox(lat = latitude, lon = longitude, data = bc)
bc_bbox
## left bottom right top
## -133.63297 47.92497 -122.33652 55.80833
# grab the maps from google
bc_big <- get_map(location = bc_bbox, source = "google", maptype = "terrain")
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=51.86665,-127.98475&zoom=6&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
# plot the points and color them by sector
ggmap(bc_big) +
geom_point(data = bc,
mapping = aes(x = longitude, y = latitude, color = sector))
ggmap(bc_big) +
geom_point(data = bc, mapping = aes(x = longitude, y = latitude, color = region))
make_bbox() might fail…)region_plot <- function(my_region) {
region_data <- bc %>% filter(region == my_region)
bbox <- make_bbox(lon = longitude, lat = latitude, data = region_data)
my_map <- get_map(location = bbox, source = "google", maptype = "terrain")
# now we want to count up how many areas there are
num_areas <- region_data %>% summarise(n_distinct(area)) %>% as.numeric
num_points <- nrow(region_data)
the_map <- ggmap(my_map) +
geom_point(data = region_data,
mapping = aes(x = longitude, y = latitude),
size = 4, color = "black") +
geom_point(data = region_data,
mapping = aes(x = longitude,
y = latitude, color = area), size = 3) +
ggtitle(sprintf("BC Region: %s with %d locations in %d area(s)",
my_region, num_points, num_areas))
ggsave(sprintf("bc_region%s.pdf", my_region), the_map, width = 9, height = 9)
}
So, with that function we just need to cycle over the regions and make all those plots. Note that I am saving them to PDFs because it is no fun to make a web page with all of those in there.
# invisible suppresses lapply's list output, which we're not interested in
bc$region %>%
unique %>%
lapply(region_plot) %>%
invisible